Support for Speculative Execution in High- Performance Processors

نویسنده

  • Michael David Smith
چکیده

Superscalar and superpipelining techniques increase the overlap between the instructions in a pipelined processor, and thus these techniques have the potential to improve processor performance by decreasing the average number of cycles between the execution of adjacent instructions. Yet, to obtain this potential performance benefit, an instruction scheduler for this high-performance processor must find the independent instructions within the instruction stream of an application to execute in parallel. For non-numerical applications, there is an insufficient number of independent instructions within a basic block, and consequently the instruction scheduler must search across the basic block boundaries for the extra instruction-level parallelism required by the superscalar and superpipelining techniques. To exploit instruction-level parallelism across a conditional branch, the instruction scheduler must support the movement of instructions above a conditional branch, and the processor must support the speculative execution of these instructions. We define boosting, an architectural mechanism for speculative execution, that allows us to uncover the instruction-level parallelism across conditional branches without adversely affecting the instruction count of the application or the cycle time of the processor. Under boosting, the compiler is responsible for analyzing and scheduling instructions, while the hardware is responsible for ensuring that the effects of a speculatively-executed instruction do not corrupt the program state when the compiler is incorrect in its speculation. To experiment with boosting, we built a global instruction scheduler, which is specifically tailored for the non-numerical environment, and a simulator, which determines the cycle-count performance of our globally-scheduled programs. We also analyzed the hardware requirements for boosting in a typical load/store architecture. Through the cycle-count simulations and an understanding of the cycle-time impact of the hardware support for boosting, we found that only a small amount of hardware support for speculative execution is necessary to achieve good performance in a small-issue, superscalar processor. . d Phrases. computer architecture, instruction scheduling, superscalar processor, trace-driven simulation

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speculative Multithreaded Processors

Architects of future generation processors will have hundreds of millions of transistors with which to build computing chips. At the same time, it is becoming clear that naive scaling of conventional (superscalar) designs will increase complexity and cost while not meeting performance goals. Consequently, many computer architects are advocating a shift in focus from high-performance to high-thr...

متن کامل

An Architecture & Mechanism for Supporting Speculative Execution of a Context-full Reconfigurable Function Unit

Recently researchers have shown interest in integrating Reconfigurable logic into conventional processors as a Reconfigurable Function Unit (RFU). A context-full RFU supports holding intermediate results inside itself, which eliminates some data movement overheads and has some other benefits. Most contemporary processors support out-of-order execution and speculation. When a context-full RFU is...

متن کامل

Efficient Exception Handling Techniques for High-performance Processor Architectures

Providing precise exceptions has driven much of the complexity in modern processor designs. While this complexity is required to maintain the illusion of a processor based on a sequential architectural model, it also results in reduced performance during normal execution. The existing notion of precise exceptions is limited to processors based on a sequential architectural model and there have ...

متن کامل

Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs. Speculative Precomputation

The performance of in-order execution ItaniumTM processors can suffer significantly due to cache misses. Two memory latency tolerance approaches can be applied for the Itanium processors. One uses an out-of-order (OOO) execution core; the other assumes multithreading support and exploits cache prefetching via speculative precomputation (SP). This paper evaluates and contrasts these two approach...

متن کامل

Performance and power comparison of Thread Level Speculation in SMT and CMP architectures

As technology advances, microprocessors that support multiple threads of execution on a single chip are becoming increasingly common. Improving the performance of general purpose applications by extracting parallel threads is extremely difficult, due to the complex control flow and ambiguous data dependences that are inherent to these applications. Thread-Level Speculation (TLS) enables specula...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992